A New Decision Tree Induction Using Composite Splitting Criterion
Authors
Abstract
C4.5 is the most widely used decision tree induction algorithm to date, and its gain ratio is by far the most popular splitting heuristic. This heuristic has a serious disadvantage, however: it copes poorly with data sources that contain irrelevant features. Hill climbing is a machine learning search technique with a strong search mechanism. Given the close relationship between hill climbing and greedy search, hill climbing can serve as part of the splitting heuristic of a decision tree in order to overcome this disadvantage of gain ratio. This paper proposes a composite splitting criterion that combines a greedy hill climbing approach with gain ratio. The experimental results show that the proposed heuristic function improves accuracy, especially when processing high-dimensional datasets.
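The abstract does not give the exact form of the composite criterion, so the sketch below only illustrates the general idea: standard gain ratio (as in C4.5) blended with a greedy, hill-climbing-style objective. The `composite_score` weighting and its `alpha` parameter are hypothetical assumptions for illustration, not the paper's published formula.

```python
# Minimal sketch: gain ratio (the C4.5 heuristic the paper builds on)
# plus a *hypothetical* composite score blending it with information
# gain as a hill-climbing objective.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, partitions):
    """Gain ratio of a split: information gain / split information.

    `partitions` is a list of label lists, one per branch of the split.
    """
    n = len(labels)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions)
    split_info = -sum((len(p) / n) * log2(len(p) / n) for p in partitions if p)
    return gain / split_info if split_info > 0 else 0.0

def composite_score(labels, partitions, alpha=0.5):
    """Hypothetical composite criterion: a convex combination of gain
    ratio and plain information gain (standing in for the hill-climbing
    objective). The weighting is an assumption, not the paper's formula."""
    n = len(labels)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions)
    return alpha * gain_ratio(labels, partitions) + (1 - alpha) * gain

# Example: a perfectly class-separating binary split scores 1.0.
print(composite_score(['a', 'a', 'b', 'b'], [['a', 'a'], ['b', 'b']]))
```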
Similar Papers
Comparing different stopping criteria for fuzzy decision tree induction through IDFID3
Fuzzy Decision Tree (FDT) classifiers combine decision trees with the approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When an FDT induction algorithm utilizes stopping criteria for early stopping of the tree's growth, the threshold values of those criteria control the number of nodes. Finding a proper threshold value for a stopping crite...
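As an illustration of how threshold values control tree growth, here is a minimal sketch of threshold-based early stopping; the specific criteria and parameter names (`min_samples`, `purity_threshold`, `max_depth`) are assumptions for illustration, not IDFID3's actual parameters.

```python
# Sketch of threshold-controlled early stopping for tree induction.
# Criteria and thresholds are illustrative assumptions.
from collections import Counter

def should_stop(labels, depth, min_samples=5, purity_threshold=0.95, max_depth=10):
    """Return True if tree growth should halt at this node."""
    if len(labels) < min_samples or depth >= max_depth:
        return True
    # Stop when the majority class already dominates the node.
    majority = Counter(labels).most_common(1)[0][1]
    return majority / len(labels) >= purity_threshold

print(should_stop(['a'] * 19 + ['b'], depth=3))  # True: node is 95% pure
```

Raising `purity_threshold` or `max_depth` grows larger trees; the snippet's point is that the choice of such thresholds directly determines the final node count.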
Induction of Multivariate Decision Trees by Using Dipolar Criteria
A new approach to the induction of multivariate decision trees is proposed. A linear decision function (hyperplane) is used at each non-terminal node of a binary tree to split the data. The search strategy is based on dipolar criterion functions and exploits the basis exchange algorithm as an optimization procedure. Feature selection is used to eliminate redundant and noisy featur...
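As a rough sketch of the dipolar idea, the code below scores a candidate hyperplane by the "dipoles" (pairs of training vectors) it cuts: cutting mixed pairs (different classes) is rewarded and cutting pure pairs (same class) is penalized. The scoring function is an illustrative assumption, not the paper's exact criterion or its basis exchange optimizer.

```python
# Hedged sketch of dipole-based hyperplane scoring.
import numpy as np

def dipole_score(X, y, w, b):
    """Cut mixed dipoles minus cut pure dipoles for hyperplane w.x + b = 0."""
    side = np.sign(X @ w + b)          # which side each vector falls on
    score = 0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if side[i] != side[j]:     # the pair is divided by the hyperplane
                score += 1 if y[i] != y[j] else -1
    return score

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(dipole_score(X, y, w=np.array([1.0, 1.0]), b=-1.0))  # 4: all mixed pairs cut
```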
Separability of Split Value Criterion with Weighted Separation Gains
An analysis of the Separability of Split Value (SSV) criterion in some particular applications has led to conclusions about possible improvements of the criterion. Here, the new formulation of the SSV criterion is presented and examined. The results obtained for 21 different benchmark datasets are presented and discussed in comparison with the most popular decision tree node splitting criteria like i...
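For context, here is a sketch of the baseline (unweighted) SSV criterion for a single-feature threshold split, in what I believe is its commonly cited form: reward separated different-class pairs, penalize same-class groups split across both sides. The weighted separation gains proposed in the paper are not reproduced here.

```python
# Sketch of the baseline SSV criterion for a scalar threshold split.
from collections import Counter

def ssv(values, labels, threshold):
    """SSV score of splitting `values` at `threshold`."""
    left = Counter(l for v, l in zip(values, labels) if v < threshold)
    right = Counter(l for v, l in zip(values, labels) if v >= threshold)
    classes = set(labels)
    n_right = sum(right.values())
    # Pairs of different-class vectors placed on opposite sides.
    separated = sum(left[c] * (n_right - right[c]) for c in classes)
    # Same-class vectors scattered across both sides.
    penalty = sum(min(left[c], right[c]) for c in classes)
    return 2 * separated - penalty

vals = [0.1, 0.2, 0.8, 0.9]
labs = ['a', 'a', 'b', 'b']
print(ssv(vals, labs, threshold=0.5))  # perfect split: 2 * (2*2) - 0 = 8
```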
Finding Multivariate Splits in Decision Trees Using Function Optimization
We present a new method for top-down induction of decision trees (TDIDT) with multivariate binary splits at the nodes. The primary contribution of this work is a new splitting criterion called soft entropy, which is continuous and differentiable with respect to the parameters of the splitting function. Using simple gradient descent to find multivariate splits and a novel pruning technique, our ...
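A hedged sketch of the soft-entropy idea follows: replacing the hard left/right indicator of a multivariate split with a sigmoid membership makes the node entropy differentiable in the split parameters `w` and `b`, so it can be minimized by gradient descent. The exact definition used in the paper may differ; this only illustrates the concept.

```python
# Sketch of a differentiable "soft" split entropy; an assumption of the
# general idea, not the paper's exact formulation.
import numpy as np

def soft_entropy(X, Y, w, b):
    """Soft entropy of the split sigmoid(X.w + b); Y holds one-hot labels."""
    m = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # soft membership in the right branch
    total = len(X)
    score = 0.0
    for side in (m, 1.0 - m):                # soft counts per branch
        counts = side @ Y                    # fractional class counts
        n = counts.sum()
        p = counts / n
        p = p[p > 0]
        score += (n / total) * -(p * np.log2(p)).sum()
    return score

X = np.array([[0.0], [0.1], [0.9], [1.0]])
Y = np.eye(2)[[0, 0, 1, 1]]                  # one-hot labels for 2 classes
print(soft_entropy(X, Y, w=np.array([10.0]), b=-5.0))  # near 0 for a clean split
```

Because the score is smooth in `w` and `b`, a gradient step on these parameters can improve the split, which is what makes gradient-based search over multivariate splits possible.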
Avoiding the Look-Ahead Pathology of Decision Tree Learning
Most decision-tree induction algorithms use a local greedy strategy, in which a leaf is always split on the best attribute according to a given attribute selection criterion. A more accurate model could possibly be found by looking ahead for alternative subtrees. However, some researchers argue that look-ahead should not be used due to a negative effect (called "decision tree pathology")...
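To make the contrast concrete, the sketch below scores an attribute with one step of look-ahead: immediate information gain plus the weighted best gain achievable inside each resulting child. The data encoding and scoring details are illustrative assumptions, not the paper's procedure.

```python
# Sketch of one-step look-ahead attribute scoring versus greedy gain.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr):
    """Information gain of splitting on attribute index `attr`."""
    n = len(labels)
    branches = {}
    for row, lab in zip(rows, labels):
        branches.setdefault(row[attr], []).append(lab)
    return entropy(labels) - sum(len(b) / n * entropy(b) for b in branches.values())

def lookahead_score(rows, labels, attr, remaining):
    """Immediate gain plus the weighted best gain inside each child."""
    n = len(labels)
    branches = {}
    for row, lab in zip(rows, labels):
        branches.setdefault(row[attr], []).append((row, lab))
    score = gain(rows, labels, attr)
    for items in branches.values():
        r = [x for x, _ in items]
        l = [y for _, y in items]
        best = max((gain(r, l, a) for a in remaining if a != attr), default=0.0)
        score += len(items) / n * best
    return score

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ['a', 'a', 'b', 'b']   # attribute 0 alone determines the class
print(lookahead_score(rows, labels, attr=0, remaining=[0, 1]))  # 1.0
```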